
    Dissipation range of the energy spectrum in high Reynolds number turbulence

    We seek to understand the kinetic energy spectrum in the dissipation range of fully developed turbulence. The data are obtained by direct numerical simulations (DNS) of the forced Navier-Stokes equations in a periodic domain, for Taylor-scale Reynolds numbers up to R_λ = 650 with excellent small-scale resolution of k_max η ≈ 6, and additionally at R_λ = 1300 with k_max η ≈ 3, where k_max is the maximum resolved wave number and η is the Kolmogorov length scale. We find that for a limited range of wave numbers k past the bottleneck, in the range 0.15 ≲ kη [...]; for kη > 1, analytical arguments as well as DNS data with superfine resolution [S. Khurshid et al., Phys. Rev. Fluids 3, 082601 (2018)] suggest a simple exp(−kη) dependence. We briefly discuss our results in connection to the multifractal model.
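
    For context (not part of the abstract), the exp(−kη) dependence mentioned above fits into the commonly used parametrization of the dissipation-range spectrum; a minimal statement in standard notation, with C and β treated here as empirical constants, is

    \[
      E(k) \;=\; C\,\varepsilon^{2/3}\,k^{-5/3} f(k\eta), \qquad
      f(k\eta) \;\propto\; \exp(-\beta\,k\eta) \ \ \text{for } k\eta \gtrsim 1,
      \qquad \eta = \left(\nu^{3}/\varepsilon\right)^{1/4},
    \]

    where ε is the mean energy dissipation rate and ν is the kinematic viscosity.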

    A highly scalable particle tracking algorithm using partitioned global address space (PGAS) programming for extreme-scale turbulence simulations

    A new parallel algorithm utilizing a partitioned global address space (PGAS) programming model to achieve high scalability is reported for particle tracking in direct numerical simulations of turbulent fluid flow. The work is motivated by the desire to obtain the Lagrangian information necessary for the study of turbulent dispersion at the largest problem sizes feasible on current and next-generation multi-petaflop supercomputers. A large population of fluid particles is distributed among parallel processes dynamically, based on instantaneous particle positions, such that all of the interpolation information needed for each particle is available either locally on its host process or on neighboring processes holding adjacent subdomains of the velocity field. With cubic splines as the preferred interpolation method, the new algorithm is designed to minimize the need for communication by transferring between adjacent processes only those spline coefficients determined to be necessary for specific particles. This transfer is implemented very efficiently as one-sided communication, using Co-Array Fortran (CAF) features which facilitate small data movements between different local partitions of a large global array. The cost of monitoring the transfer of particle properties between adjacent processes, for particles migrating across subdomain boundaries, is found to be small. Detailed benchmarks are obtained on the Cray petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign. For operations on the particles in an 8192^3 simulation (0.55 trillion grid points) on 262,144 Cray XE6 cores, the new algorithm is found to be orders of magnitude faster than a prior algorithm in which each particle is tracked by the same parallel process at all times. This large speedup reduces the additional cost of tracking of order 300 million particles to just over 50% of the cost of computing the Eulerian velocity field at this scale. Improving support for PGAS models in major compilers suggests that this algorithm will be of wider applicability on most upcoming supercomputers.
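
    The abstract's one-sided transfers use Co-Array Fortran; purely as an illustration of the same idea, the sketch below fetches a small block of spline coefficients from the rank owning an adjacent subdomain using MPI one-sided communication in C. NLOCAL, NFETCH and the neighbor mapping are hypothetical placeholders, not the authors' code.

    /* Minimal sketch: each rank exposes its local block of cubic-spline
     * coefficients in an RMA window; a rank tracking a particle near a
     * subdomain boundary pulls only the few coefficients it needs from the
     * neighboring rank, without that rank's active participation.         */
    #include <mpi.h>
    #include <stdlib.h>

    #define NLOCAL  4096   /* assumed: spline coefficients stored per rank   */
    #define NFETCH  64     /* assumed: coefficients needed for one particle  */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        /* Local partition of the "global array" of spline coefficients. */
        double *coef = malloc(NLOCAL * sizeof(double));
        for (int i = 0; i < NLOCAL; i++) coef[i] = rank + 1e-3 * i;

        MPI_Win win;
        MPI_Win_create(coef, NLOCAL * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        double fetched[NFETCH];             /* buffer for remote coefficients   */
        int neighbor = (rank + 1) % nproc;  /* stand-in for adjacent subdomain  */

        MPI_Win_fence(0, win);
        /* One-sided get: pull NFETCH coefficients starting at offset 0 of the
         * neighbor's window; the neighbor issues no matching call.            */
        MPI_Get(fetched, NFETCH, MPI_DOUBLE, neighbor, 0, NFETCH, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        /* ... interpolate particle velocities from coef and fetched here ... */

        MPI_Win_free(&win);
        free(coef);
        MPI_Finalize();
        return 0;
    }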

    A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number

    A new dual-communicator algorithm with very favorable performance characteristics has been developed for direct numerical simulation (DNS) of turbulent mixing of a passive scalar governed by an advection-diffusion equation. We focus on the regime of high Schmidt number (Sc), where, because of low molecular diffusivity, the grid-resolution requirements for the scalar field are stricter than those for the velocity field by a factor √Sc. Computational throughput is improved by simulating the velocity field on a coarse grid of N_v^3 points with a Fourier pseudo-spectral (FPS) method, while the passive scalar is simulated on a fine grid of N_θ^3 points with a combined compact finite difference (CCD) scheme which computes first and second derivatives at eighth-order accuracy. A static three-dimensional domain decomposition and a parallel solution algorithm for the CCD scheme are used to avoid the heavy communication cost of memory transposes. A kernel is used to evaluate several approaches to optimize the performance of the CCD routines, which account for 60% of the overall simulation cost. On the petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign, scalability is improved substantially with a hybrid MPI-OpenMP approach in which a dedicated thread per NUMA domain overlaps communication calls with computational tasks performed by a separate team of threads spawned using OpenMP nested parallelism. At a target production problem size of 8192^3 (0.5 trillion) grid points on 262,144 cores, CCD timings are reduced by 34% compared to a pure-MPI implementation. Timings for 16384^3 (4 trillion) grid points on 524,288 cores encouragingly maintain scalability greater than 90%, although the wall-clock time is too high for production runs at this size. Performance monitoring with CrayPat for problem sizes up to 4096^3 shows that the CCD routines can achieve nearly 6% of the peak flop rate. The new DNS code is built upon two existing FPS and CCD codes. With the grid ratio N_θ/N_v = 8, the disparity in the computational requirements for the velocity and scalar problems is addressed by splitting the global communicator MPI_COMM_WORLD into disjoint communicators for the velocity and scalar fields, respectively. Intercommunicator transfer of the velocity field from the velocity communicator to the scalar communicator is handled with discrete send and non-blocking receive calls, which are overlapped with other operations on the scalar communicator. For production simulations at N_θ = 8192 and N_v = 1024 on 262,144 cores for the scalar field, the DNS code achieves 94% strong scaling relative to 65,536 cores and 92% weak scaling relative to N_θ = 1024 and N_v = 128 on 512 cores.
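
    To illustrate the communicator layout described above, here is a minimal, hypothetical C/MPI sketch (not the production code) that splits MPI_COMM_WORLD into disjoint velocity and scalar groups and overlaps a non-blocking receive of velocity data with scalar-side work; NV_RANKS, CHUNK and the simple rank pairing are assumptions for the example.

    /* Minimal sketch: color 0 = velocity group, color 1 = scalar group.     */
    #include <mpi.h>

    #define NV_RANKS 4      /* assumed number of ranks owning the velocity field */
    #define CHUNK    1024   /* assumed size of one velocity slab per transfer    */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        int color = (rank < NV_RANKS) ? 0 : 1;
        MPI_Comm group_comm;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group_comm);

        double buf[CHUNK];
        if (color == 0 && rank + NV_RANKS < nproc) {
            /* velocity rank: fill buf from the coarse-grid velocity field and
             * send it to a partner rank in the scalar group                   */
            for (int i = 0; i < CHUNK; i++) buf[i] = (double)i;
            MPI_Send(buf, CHUNK, MPI_DOUBLE, rank + NV_RANKS, 0, MPI_COMM_WORLD);
        } else if (color == 1 && rank - NV_RANKS < NV_RANKS) {
            /* scalar rank: post a non-blocking receive, keep computing on the
             * fine-grid scalar field, then wait for the velocity slab          */
            MPI_Request req;
            MPI_Irecv(buf, CHUNK, MPI_DOUBLE, rank - NV_RANKS, 0,
                      MPI_COMM_WORLD, &req);
            /* ... overlap: advance CCD scalar computations here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        MPI_Comm_free(&group_comm);
        MPI_Finalize();
        return 0;
    }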

    GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5

    This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution, enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs, improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192^3 grid points for the scalar field, computed with 8192 XK7 nodes.
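
    As an illustration of the asynchronous offload pattern mentioned above, the hypothetical C/OpenMP 4.5 sketch below uses a TARGET construct with the NOWAIT clause to overlap a GPU loop with CPU-side work, then synchronizes with TASKWAIT; the array size and the stand-in computations are not from the paper.

    /* Minimal sketch: the nowait clause turns the target region into a
     * deferred task, so the host thread continues with CPU work at once.   */
    #include <stdlib.h>

    #define N (1 << 20)   /* assumed array length */

    static void cpu_side_work(double *b, int n)
    {
        /* stand-in for CPU computation/communication overlapped with the GPU */
        for (int i = 0; i < n; i++) b[i] = 2.0 * b[i];
    }

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = i; }

        /* Asynchronous GPU region */
        #pragma omp target teams distribute parallel for \
                map(tofrom: a[0:N]) nowait
        for (int i = 0; i < N; i++)
            a[i] = a[i] * a[i];      /* stand-in for CCD-type arithmetic */

        cpu_side_work(b, N);         /* overlapped CPU work */

        #pragma omp taskwait         /* wait for the deferred target task */

        free(a);
        free(b);
        return 0;
    }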

    Is vortex stretching the main cause of the turbulent energy cascade?

    In three-dimensional turbulence there is on average a cascade of kinetic energy from the largest to the smallest scales of the flow. While the dominant idea is that the cascade occurs through the physical process of vortex stretching, evidence for this is debated. In the framework of the Kármán-Howarth equation for the two-point turbulent kinetic energy, we derive a new result for the average flux of kinetic energy between two points in the flow that reveals the role of vortex stretching. However, the result shows that vortex stretching is in fact not the main contributor to the average energy cascade; the main contributor is the self-amplification of the strain-rate field. We emphasize the need to correctly distinguish, and not conflate, the roles of vortex stretching and strain self-amplification in order to correctly understand the physics of the cascade, and also to resolve a paradox regarding the differing roles of vortex stretching in the mechanisms of the energy cascade and the energy dissipation rate. Direct numerical simulations are used to confirm the results, as well as to provide further results and insights on vortex stretching and strain self-amplification at different scales in the flow. Interestingly, the results imply that while vortex stretching plays a sub-leading role in the average cascade, it may play a leading-order role during large fluctuations of the energy cascade about its average behavior.
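
    As standard background for the two mechanisms contrasted above (textbook definitions, not the paper's new result), with S_ij the strain-rate tensor and ω the vorticity:

    \[
      S_{ij} = \tfrac{1}{2}\left(\partial_i u_j + \partial_j u_i\right), \qquad
      \omega_i = \epsilon_{ijk}\,\partial_j u_k,
    \]
    \[
      \text{vortex stretching (enstrophy production):}\ \ \omega_i S_{ij}\,\omega_j, \qquad
      \text{strain self-amplification:}\ \ -S_{ij}S_{jk}S_{ki},
    \]
    \[
      \langle S_{ij}S_{jk}S_{ki}\rangle \;=\; -\tfrac{3}{4}\,\langle \omega_i S_{ij}\,\omega_j\rangle
      \quad \text{(Betchov 1956, homogeneous incompressible turbulence)}.
    \]

    The Betchov relation ties the two mean amplification rates together in homogeneous turbulence, which is one reason their distinct dynamical roles in the cascade are easy to conflate.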